Learning Objectives

After completing this lesson, you’ll be able to:

Resources

Exercise

The provincial government has given the city a grant to fund new public art in parks.

Your colleague has created a workspace to analyze the amount of art in each city park, and we are carrying out a code review to ensure that the workspace is efficient and well-designed. In another course, we inspected the workspace to see what it produced, deconstructed the log file, and ran the workspace multiple times to assess the relative performance of each component.

In this exercise, we'll make some specific changes to improve performance.

1) Open and Run Workspace

Open the starting workspace in FME Workbench (2022.0 or later). Turn on feature caching and run the workspace.

2) Check Reader Methodology

First, let's first check if the readers are well-designed and don't read any excess data.

To check the orthophoto coverage, start the FME Data Inspector (it will be easier than the Visual Preview window). You can open FME Data Inspector from the Windows Start menu > FME Desktop > FME Data Inspector. 

Select File > Open Dataset and set the files to open as follows:

Reader Format GeoTIFF (Geo-referenced Tagged Image File Format)
Reader Dataset https://s3.amazonaws.com/FMEData/FMEData/Data/Orthophotos.zip
Parameters > Feature Type Name(s) From File Name(s)

Note

If you find the reading the orthophotos is taking too long, you can download the .zip file, extract it, and select all of the TIF files in the extracted folder. Reading the data locally will be faster.

The Feature Type Name(s) parameter makes sure the Data Inspector lists files by name.

Now select File > Add Dataset and set that dialog up as follows:

Reader Format Google KML
Reader Dataset https://s3.amazonaws.com/FMEData/FMEData/Data/Boundaries/VancouverNeighborhoods.kml

Now you can check whether any GeoTIFF tiles fall outside the extents of the neighborhoods. Because some orthophotos do not overlap the neighborhood boundaries, our workspace is reading more orthophotos than is necessary.

3) Switch GeoTIFF Reader to FeatureReader

It appears that there are GeoTIFF files being read unnecessarily. This causes a performance hit, especially when feature caching is turned on. We could just select the files we want to read, but if the neighborhoods dataset changed, then this list might not be correct.

So, return to FME Workbench.

Delete the GeoTIFF format reader from the workspace. Add a FeatureReader transformer in its place. Make the reprojected Neighborhoods data the Initiator (create a separate connection):

Replacing the GeoTIFF reader with a FeatureReader

Open the FeatureReader parameters. Set the Format to GeoTIFF and set https://s3.amazonaws.com/FMEData/FMEData/Data/Orthophotos.zip as the Dataset. Set the Spatial Filter parameter to Initiator OGC-Intersects Result:

Configuring the FeatureReader

Close the dialog and the FeatureReader now has a GeoTIFF output port. Connect this to the RasterMosaicker:

Connecting the FeatureReader to the RasterMosaicker

If we run the workspace now the FeatureReader outputs 48 features. This is more than we had before (40) and occurs because GeoTIFF tiles that overlap two neighborhoods are being read twice.

So add a Dissolver transformer between Reprojector_2 and FeatureReader. This will consolidate the neighborhoods into a single feature and ensure each GeoTIFF is only read once. Re-run the workspace:

Dissolving fixes reading duplicate photos

Now we only read 27 features, which is the correct amount. We are reading fewer photos and so the workspace is performing better.

4) Check Writer Order

The simplest writer improvement we might make is to change the order of the writers. Currently, the Excel spreadsheet is being written first. This means that GeoTIFF files - which are large in size - are being cached, instead of the smaller Excel file.

So, adjust the order of the writers, so that the GeoTIFF writer comes first in the Navigator:

Changing writer order

Re-run the workspace. Although the workspace wasn't really slow to start with, you should notice it now runs slightly faster, using less memory.

Note

One other writer feature to consider is whether the Excel dataset is being deleted/recreated, or just emptied of data. In theory, it might be (very marginally) quicker to empty the sheets rather than creating the whole spreadsheet from scratch.

To try this, change the writer parameter Overwrite Existing File to No, and change the writer feature type parameter for Truncate Existing Sheet to Yes:

It's unlikely to make a very large performance improvement, but you should consider differences like these when reviewing a workspace for performance. You would especially want to consider this question when writing to a database table with an index (more on that later).

5) Upgrade Transformers

Now let's look into transformers. This is where there are a lot of different changes we can make. The first is to check for old transformer versions. Notice in the Navigator window that two transformers are listed as "Upgradeable":

Upgradable transformers

In turn, right-click each entry and choose to Upgrade Transformer. You will be prompted with a warning that you can ignore, and even choose to skip in the future.

A dialog will open to show the changes in GUI to the transformer, and you can click the Show Changes button to get a written list of changes:

Comparing transformer versions

Upgrading transformers doesn't always make them operate faster - some changes are either functional or cosmetic - and it might make their results slightly different. Therefore, it's not advisable to upgrade all transformers without first checking what the upgrade involves.

However, in this case, both transformers should be safe to upgrade, and they both may even get an improved performance from the upgrade. So go ahead and upgrade both transformers.

Note

The Tester transformer was updated in 2019.1 and can be upgraded to use bulk mode. You'll remember, however, that this might change the order of features emerging from the transformer.

Do you see any part of the workspace that relies on feature order? If not, you can safely upgrade this transformer to its newest version, in order to get a performance boost. By default the new parameter Advanced > Preserve Feature Order is set to Across Output Ports. This preserves feature order but does not allow the Tester to take advantage of bulk mode. To improve performance, change it to Per Output Port.

Learn More

6) Check Transformer Order

Look at the bookmark labeled Prepare Data for Excel Writer:

Prepare Data for Excel Writer bookmark

Inspect the transformers and you will see that data is sorted into order for writing, then unnecessary features are filtered out and unnecessary attributes removed.

This is not the correct order to maximize performance: unnecessary features with unnecessary attributes are being processed by the sorting action. Remember, the key order is Filter-Remove-Action.

So, move the Sorter transformer after the AttributeRenamer:

Moving Sorter to the end

Notice that the Sorter transformer is now flagged as incomplete. Inspect the parameters and you'll notice that the attribute _overlaps no longer exists; it was renamed by the AttributeRenamer to ArtWorks.

So simply change the Sorter to sort by ArtWorks and the transformer will work again.

Note

If the features coming into the Sorter were already grouped by one attribute, we could group by that attribute and sort by another for a performance improvement. But, in this case, that would not work.

The other thing to consider is whether data can be filtered or removed earlier in the workspace. The two filtering transformers in this workspace are the Tester and DuplicateFilter. Can we move these to earlier in the workspace? Can the Tester transformer be moved? Note your answer: you'll need it for the quiz.

Can the AttributeRemover and AttributeRenamer transformers be moved? Well, the AttributeRenamer can't be moved because it renames the _overlaps attribute, an action that can only happen here. However, removing attributes could be carried out much earlier.

The simplest technique is to add an AttributeRemover after every reader whose attributes can be removed:

Adding AttributeRemovers

So, remove the existing AttributeRemover and add an AttributeRemover to each input that can be cleaned, and remove whatever attributes are not necessary to the workspace. You can tell if you remove a necessary attribute if a transformer (or writer feature type) further on in the stream is flagged as incomplete, with a red port on the expanded writer feature type.

Now we have removed all unnecessary attributes from the workspace, as soon as possible.

Note

An alternative solution for database-type formats is to not read the attributes at all. In our workspace, the Excel reader is capable of this. You could open the reader feature type, change the Attributes to Read to Exposed Attributes, and uncheck Name and Title.

If you do this then you do not need the AttributeRemover.

7) Check Transformer Performance Parameters

As mentioned, a number of transformers have parameters specifically for performance benefits. These are often labeled as Group By Mode or <Features First>

Check the transformers in this workspace. The two of particular interest are the Clipper and PointOnAreaOverlayer. Both of them have a Group Processing and a Features First parameter:

Setting Clipper Type and Areas First parameters for performance improvements

The Group Processing parameter doesn't apply, because neither transformer is using a Group By. However, the Clippers First and Areas First options are of interest. If we set these options we can get a performance boost, but we do have to confirm that either Clippers or Areas are the first features to arrive.

In the PointOnAreaOverlayer, change the Areas First parameter to Yes. Re-run the workspace (either turn off caching or use re-run the entire workspace). Notice that all Area features exit as <Rejected> features. They are rejected because they are not first!

PointOnAreaOverlayer rejected features

One reason might be that the MapInfo park features are being read after the Excel records. So, in the Navigator window move the Parks reader to the top of the list:

Changing reader order

Re-run the workspace. Notice that the park features are now first. This part of the workspace should be working more efficiently now.

Try the same action on the Clipper transformer, to see if Clipper features are first so that the Clipper Type parameter can be set to Clippers First.

8) Re-Run Workspace

With Feature Caching off, re-run the entire workspace. Check if the log results show that the workspace is quicker and more memory efficient than it was before. The code review of your colleague's workspace is complete.

Note

Overall, the difference might be very slight. The workspace may not be hugely faster or more memory efficient. However, it is better designed than the original workspace. This makes it more scalable, plus it will help to teach your colleague techniques that might, elsewhere, have a larger effect.